NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

PredTOP: Latency Predictor Utilizing DAG Transformers for Distributed Deep Learning Training with Operator Parallelism

https://doi.org/10.1109/IPDPS64566.2025.00069

Acharya, Dipak; Shu, Tong (June 2025, IEEE)

Free, publicly-accessible full text available June 3, 2026
Exploration of TPU Architectures for the Optimized Transformer in Drainage Crossing Detection

https://doi.org/10.1109/BigData62323.2024.10826077

Nazeri, Amirhossein; Godwin, Denys W; Maria_Panteleaki, Aikaterini; Anagnostopoulos, Iraklis; Edidem, Michael I; Li, Ruopu; Shu, Tong (December 2024, IEEE)

Full Text Available
AM-DGCNN: Leveraging Graph Attention Networks and Edge Attributes for Link Classification in Knowledge Graphs

https://doi.org/10.1109/SCW63240.2024.00144

Pandey, Dhroov; Shu, Tong (November 2024, IEEE)

Full Text Available
HIOS: Hierarchical Inter-Operator Scheduler for Real-Time Inference of DAG-Structured Deep Learning Models on Multiple GPUs

https://doi.org/10.1109/CLUSTER52292.2023.00016

Kundu, Turja; Shu, Tong (October 2023, IEEE)

Neural-network-enabled data analysis in real-time scientific applications imposes stringent requirements on inference latency. Meanwhile, recent deep learning (DL) model design trends to replace a single branch with multiple branches for high prediction accuracy and robustness, which makes interoperator parallelization become an effective approach to improve inference latency. However, existing inter-operator parallelization techniques for inference acceleration are mainly focused on utilization optimization in a single GPU. With the data size of an input sample and the scale of a DL model ever-growing, the limited resource of a single GPU is insufficient to support the parallel execution of large operators. In order to break this limitation, we study hybrid inter-operator parallelism both among multiple GPUs and in each GPU. In this paper, we design and implement a hierarchical inter-operator scheduler (HIOS) to automatically distribute large operators onto different GPUs and group small operators in the same GPU for parallel execution. Particularly, we propose a novel scheduling algorithm, named HIOS-LP, which consists of inter-GPU operator parallelization through iterative longest-path (LP) mapping and intra-GPU operator parallelization based on a sliding window. In addition to extensive simulation results, experiments with modern convolutional neural network benchmarks demonstrate that our HIOS-LP outperforms the state-of-the-art inter-operator scheduling algorithm IOS by up to 17% in real systems.
more » « less
Accuracy-Constrained Efficiency Optimization and GPU Profiling of CNN Inference for Detecting Drainage Crossing Locations

https://doi.org/10.1145/3624062.3624260

Zhang, Yicheng; Pandey, Dhroov; Wu, Di; Kundu, Turja; Li, Ruopu; Shu, Tong (November 2023, ACM)

The accurate and efficient determination of hydrologic connectivity has garnered significant attention from both academic and industrial sectors due to its critical implications for environmental management. While recent studies have leveraged the spatial characteristics of hydrologic features, the use of elevation models for identifying drainage paths can be influenced by flow barriers. To address these challenges, our focus in this study is on detecting drainage crossings through the application of advanced convolutional neural networks (CNNs). In pursuit of this goal, we use neural architecture search to automatically explore CNN models for identifying drainage crossings. Our approach not only attains high accuracy (over 97% for average precision) in object detection but also excels in efficiently inferring correct drainage crossings within a remarkably short time frame (0.268 ms). Furthermore, we perform a detailed profiling of our approach on GPU systems to analyze performance bottlenecks.
more » « less
Pareto Optimization of CNN Models via Hardware-Aware Neural Architecture Search for Drainage Crossing Classification on Resource-Limited Devices

https://doi.org/10.1145/3624062.3624258

Li, Yuke; Baik, Jiwon; Rahman, Md Marufi; Anagnostopoulos, Iraklis; Li, Ruopu; Shu, Tong (November 2023, ACM)

Embedded devices, constrained by limited memory and processors, require deep learning models to be tailored to their specifications. This research explores customized model architectures for classifying drainage crossing images. Building on the foundational ResNet-18, this paper aims to maximize prediction accuracy, reduce memory size, and minimize inference latency. Various configurations were systematically probed by leveraging hardware-aware neural architecture search, accumulating 1,717 experimental results over six benchmarking variants. The experimental data analysis, enhanced by nn-Meter, provided a comprehensive understanding of inference latency across four different predictors. Significantly, a Pareto front analysis with three objectives of accuracy, latency, and memory resulted in five non-dominated solutions. These standout models showcased efficiency while retaining accuracy, offering a compelling alternative to the conventional ResNet-18 when deployed in resource-constrained environments. The paper concludes by highlighting insights drawn from the results and suggesting avenues for future exploration.
more » « less
A Comparative Survey: Reusing Small Pre-Trained Models for Efficient Large Model Training

https://doi.org/10.1109/SCW63240.2024.00015

Pandey, Dhroov; Ghebremichael, Jonah; Qi, Zongqing; Shu, Tong (November 2024, IEEE)

Full Text Available
Modeling Lunar Surface Charging Using Physics-Informed Neural Networks

https://doi.org/10.1109/BigData62323.2024.10825168

Zendehdel, Niloofar; Mosharrof, Adib; Delgado, Katherine; Han, Daoru; Liang, Xin; Shu, Tong (December 2024, IEEE)

Full Text Available
A Deep Learning Approach to Maximizing Electrostatic Sieve Efficiency in Regolith Beneficiation

https://doi.org/10.1109/BigData62323.2024.10826123

Vadnerkar, Kalpit M; Eze, Emmanuela Amen; Gautam, Rinoj; Han, Daoru; Liang, Xin; Shu, Tong (December 2024, IEEE)

Full Text Available

Search for: All records